Rupprecht C, Laina I, DiPietro R, et al. Learning in an uncertain world: Representing ambiguity through multiple hypotheses[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 3591-3600.
1. Overview
1.1. Motivation
- uncertainty arises from the way data is labeled (e.g., the labels of occluded joints in pose estimation)
In this paper
- reformulate existing single-prediction models as multiple hypothesis prediction (MHP) models via a meta loss
- MHP can expose valuable insights
- MHP outperforms single hypothesis prediction (SHP)
- experiments on
- human pose estimation
- future frame prediction
- classification (multi-label)
- segmentation
1.2. Related Work
- Multiple Choice Learning
- Multi-label Recognition
2. Methods
2.1. SHP
2.2. MHP
2.2.1. Meta Loss Function
- M. the number of hypotheses (the output layer is replicated M times with different initializations)
- Delta. indicator function: 1 if its argument is true, 0 otherwise
- f_j. the j-th predictor among the M predictors
The meta loss can be regarded as the original loss weighted by Delta: for each sample, only the hypothesis closest to the label receives the loss (winner-takes-all)
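The winner-takes-all weighting above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the L2 base loss and the function name `meta_loss` are assumptions for the example.

```python
import numpy as np

def meta_loss(preds, y):
    """Winner-takes-all meta loss (sketch): Delta zeroes out every
    hypothesis except the one closest to the label y, so only that
    hypothesis contributes to the loss. Base loss here is L2."""
    # preds: (M, D) array of M hypotheses; y: (D,) target
    losses = np.sum((preds - y) ** 2, axis=1)  # base loss per hypothesis
    j = np.argmin(losses)                      # Delta selects the best one
    return losses[j], j

preds = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([0.9, 1.1])
loss, best = meta_loss(preds, y)
# hypothesis 1 is closest, so it alone receives the loss
```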
2.2.2. Procedure
- create M predictors, then forward each sample
- build y_i(x)
- compute gradient and update
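The three-step procedure above can be sketched with a toy setup: a hedged example in which M constant hypothesis vectors stand in for M replicated output layers, with an assumed L2 base loss and hand-picked initial values.

```python
import numpy as np

# M = 3 hypothesis vectors with different initializations (a simplified
# stand-in for M replicated output layers)
theta = np.array([[0.1, -0.1], [0.6, 0.1], [-0.5, 0.4]])

def train_step(theta, y, lr=0.1):
    losses = np.sum((theta - y) ** 2, axis=1)  # forward all M hypotheses
    j = np.argmin(losses)                      # label y falls in cell j
    theta[j] -= lr * 2.0 * (theta[j] - y)      # gradient updates only the winner
    return theta

targets = np.array([[1.0, 0.0], [-1.0, 0.0]])  # bimodal label distribution
for _ in range(200):
    for y in targets:
        theta = train_step(theta, y)
# the two modes end up covered by two different hypotheses
```

With a single predictor (SHP), the same setup would converge to the average of the two modes, which is exactly the ambiguity the meta loss avoids.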
2.2.3. Relax Delta
- solves the problem that the predictors may be initialized so far from the target labels y that all y lie in the Voronoi cell of a single predictor k
- additionally, predictions are dropped with some low probability (1%) to introduce randomness into the selection of the best hypothesis, so that weaker predictions do not vanish during training
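A sketch of the relaxed assignment, assuming the softened form in which the best hypothesis gets weight 1 - eps and the remaining M - 1 hypotheses share eps (the value eps = 0.05 below is illustrative, and `relaxed_weights` is a hypothetical name):

```python
import numpy as np

def relaxed_weights(losses, eps=0.05):
    """Soft version of Delta: the best hypothesis gets weight 1 - eps,
    the other M - 1 share eps, so no predictor is completely starved of
    gradient even when all labels fall into one Voronoi cell."""
    M = len(losses)
    w = np.full(M, eps / (M - 1))
    w[np.argmin(losses)] = 1.0 - eps
    return w

w = relaxed_weights(np.array([0.5, 0.1, 0.9]))
# the best hypothesis (index 1) dominates, but the others keep a small share
```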
2.2.4. Hyper-parameter M
- almost every method that models posterior probabilities needs some form of hand-tuned model parameter (e.g., k in k-means, the number of mixture components in MDNs)
3. Experiments
3.1. Pose
- SHP. 59.7%
- 2-MHP. 60.0%
- 5-MHP. 61.2%
- 10-MHP. 62.8%
- with an increasing number of predictions, the method is able to model the output space more and more precisely
3.2. Future Frame Predictions
3.3. Classification
- if an image contains two bikes and a person, each time the image is sampled during training it is labeled either as bike or as person, each with 50% probability
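That stochastic single-label sampling can be sketched directly; `sample_label` is a hypothetical helper name, not from the paper.

```python
import random

def sample_label(labels):
    """Multi-label image trained with a single-label loss: each time the
    image is sampled, one of its labels is drawn uniformly at random."""
    return random.choice(labels)

random.seed(0)
draws = [sample_label(["bike", "person"]) for _ in range(1000)]
# roughly half the draws are "bike" and half "person"
```

Under this labeling, an SHP model is pushed toward an ambiguous average, while an MHP model can dedicate different hypotheses to "bike" and "person".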
3.4. Segmentation
- MHP (70.3%) vs MCL (69.1%)